Irregular Lattices

Dependencies

Code
source("utils.R")
theme_set(theme_minimal())

For this representation of cells, we will rely on the SpatialFeatureExperiment package. For preprocessing of the dataset, we refer the reader to the vignette of the voyager package.

Code
(sfe <- HeNSCLCData())

# Empty cells
colData(sfe)$is_empty <- colData(sfe)$nCounts < 1
# Select, count negative control probes
(neg_inds <- str_detect(rownames(sfe), "^NegPrb")) %>% sum

colData(sfe)$prop_neg <- colSums(counts(sfe)[neg_inds,])/colData(sfe)$nCounts
# Remove low quality cells
sfe <- sfe[,!sfe$is_empty & sfe$prop_neg < 0.1]
# Calculate count stats
rowData(sfe)$means <- rowMeans(counts(sfe))
rowData(sfe)$vars <- rowVars(counts(sfe))
rowData(sfe)$is_neg <- neg_inds
# log Counts
sfe <- logNormCounts(sfe)

In this vignette, we will show the metrics related a ligand-receptor pair, CEACAM6 and EGFR which was identified in the original publication of the CosMx dataset (He et al. 2022).

Code
plotSpatialFeature(sfe, c("CEACAM6"),
                   colGeometryName = "centroids", 
                   ncol = 2, scattermore = TRUE) + 
  theme_void()

Code
plotSpatialFeature(sfe, c("EGFR"),
                   colGeometryName = "centroids", 
                   ncol = 2, scattermore = TRUE) + 
  theme_void()

Code
colGraph(sfe, "knn5") <- findSpatialNeighbors(sfe, method = "knearneigh",
                                                  dist_type = "idw", k = 5, 
                                                  style = "W")
weights_neighbourhoods <- colGraph(sfe, "knn5")

Global Measures for Bivariate Data

Global Bivariate Moran’s I

For two continous observation the global bivariate Moran’s I is defined as (Wartenberg 1985 ; Bivand 2022)

\[I_B = \frac{\Sigma_i(\Sigma_j{w_{ij}y_j\times x_i})}{\Sigma_i{x_i^2}}\]

where \(x_i\) and \(y_i\) are the two variables of interest and \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\).

The global bivariate Moran’s I is a measure of autocorrelation of the variables \(x\) and \(y\) with the spatial lag of \(y\). Therefore the result might overestimate the spatial autocorrelation of the variables due to the inherent (non-spatial) correlation of \(x\) and \(y\) (Bivand 2022).

Implementation using spded

Code
res_xy <- spdep::moran_bv(x = logcounts(sfe)["CEACAM6",],
         y = logcounts(sfe)["EGFR",],
         listw =  colGraph(sfe, "knn5"),
         nsim = 499)
boot::boot.ci(res_xy, conf = c(0.99, 0.95, 0.9), type = "basic")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 499 bootstrap replicates

CALL : 
boot::boot.ci(boot.out = res_xy, conf = c(0.99, 0.95, 0.9), type = "basic")

Intervals : 
Level      Basic         
99%   ( 0.1280,  0.1365 )   
95%   ( 0.1293,  0.1353 )   
90%   ( 0.1300,  0.1348 )  
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
Code
plot(res_xy)

From the result of the global measure, the overall spatial autocorrelation of the two genes is not significant.

Global Bivariate Lee’s L

Lee’s L is a bivariate measure that combines non-spatial pearson correlation with spatial autocorrelation via Moran’s I (Lee 2001). This enables us to asses the spatial dependence of two continuous variables in a single measure. The measure is defined as

\[L(x,y) = \frac{n}{\sum_{i=1}^n(\sum_{j=1}^nw_{ij})^2}\frac{\sum_{i=1}^n(\sum_{j=1}^nw_{ij}(x_i-\bar{x}))(\sum_{j=1}^nw_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{i=1}^nw_{ij}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^nw_{ij}(y_i-\bar{y})^2}}, \]

where \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\), \(x\) and \(y\) the two variables and \(\bar{x}\) and \(\bar{y}\) their means.

Implementation using spded

Code
res_lee <- calculateBivariate(sfe, type = "lee.mc", 
                   feature1 = "CEACAM6", feature2 = "EGFR",
                   nsim = 499)
res_lee$lee.mc_statistic
 statistic 
0.06726344 
Code
res_lee$ lee.mc_p.value
[1] 0.002

The interpretation of the result is not straightforward, as identical patterns do not have a value of 1 but have a value depending on their spatial arrangement. In general positive values indicate spatial co-occurrence while negative values indicate spatial segregation of the pattern of variables of \(x\) and \(y\) (Lee 2001).

Local Measures for Bivariate Data

Bivariate Lee’s L

Similar to the global variant of Lee’s L the local variant (Lee 2001) is defined as

\[L_i(x,y) = \frac{\sum_{i=1}^n(\sum_{j=1}^nw_{ij}(x_i-\bar{x}))(\sum_{j=1}^nw_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{i=1}^nw_{ij}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^nw_{ij}(y_i-\bar{y})^2}}, \] Local Lee’s L is a measure of spatial co-expression, when the variables of interest are gene expression measurements and can also be a metric of co-localization. Unlike the gobal version, the variables are not averaged and show the local contribution to the metric. Positive values indicate colocalization, negative values indicate segregation.

This can be interesting in the context of detection of coexpressed ligand-receptor pairs. A method that is based on bivariate Moran’s I and tries to detect such pairs is SpatialDM (Li et al. 2023).

Implementation using voyager

Code
sfe <- runBivariate(sfe, type = "locallee",
                    feature1 = "CEACAM6", feature2 = "EGFR")

plotLocalResult(sfe, "locallee", 
                features = localResultFeatures(sfe, "locallee"),
                ncol = 2, colGeometryName = "centroids",
                divergent = TRUE, diverge_center = 0) 

Local Measures for Multivariate Data

Multivariate local Geary’s C

Geary’s C is a measure of spatial autocorrelation that is based on the difference between a variable and its neighbours. (Anselin 2019) defines it as

\[C_i = \sum_{j=1}^n w_{ij}(z_i-z_j)^2,\]

and can be generalized to \(k\) parameters by expanding

\[c_{k,i} = \sum_{v=1}^k c_{v,i}\]

where \(c_{v,i}\) is the local Geary’s C for the \(v\)th variable at location \(i\). The number of variables that can be used is not fixed, which makes the interpretation a bit more difficult. In general, the metric summarizes similarity in the “multivariate attribute space” (i.e. the gene expression) to its geographic neighbours. The common difficulty in these analyses is the interpretaing the mixture of similarity in the geographic space and similarity in the attribute space.

Here we use the marker genes used in the original paper that study apoptosis (He et al. 2022, Supplementary Table 1) in a subset of the tissue, as the computation is very intensive.

Code
apoptosis_genes <- c("BAX", "BCL2", "BCL2L1", "BIRC5", "CASP3", "CASP8", "TP53")

# Subset of the tissue
bbox_use <- st_as_sfc(st_bbox(c(xmin = 3200, xmax = 15800, ymin = 155200, ymax = 164200)))
sfe_sub <- sfe[,st_intersects(colGeometries(sfe)$centroids, bbox_use, sparse = FALSE)]

sfe_sub <- runMultivariate(sfe_sub, type = "localC_multi",
                    subset_row = apoptosis_genes)

# Local C mutli is stored in colData so this is a workaround to plot it
plotSpatialFeature(sfe_sub, "localC_multi", colGeometryName = "centroids")

The plot indicates regions where the gene expression is more homogeneous (low Geary’s C) and regions where the gene expression is more heterogeneous (large Geary’s C value). Importantly, strong similarity in one or some variables may compensate for dissimilarity in other variables, which makes the interpretation tricky. The local Geary’s C value is not scaled.

We can further plot the results of the permutation test. Significant values indicate interesting regions, but should be interpreted with care for various reasons. For example, we are looking for similarity in a combination of multiple values but the exact combination is not known. For further details, see (Anselin 2019).

(MR: this makes me question how we should present this. SG: Good question, it comes very close to spatial clustering here, maybe we can show this as a preview for other methods that we do not discuss.. From the Anselin 2019 publication: “Overall, however, the statistic indicates a combination of the notion of distance in multi-attribute space with that of geographic neighbors. This is the essence of any spatial autocorrelation statistic. It is also the trade-off encountered in spatially constrained multivariate clustering methods (for a recent discussion, see, e.g., Grubesic, Wei, and Murray 2014).”)

Code
sfe_sub <- runMultivariate(sfe_sub, type = "localC_perm_multi",
                    subset_row = apoptosis_genes,
                    nsim = 100)

# stored as spatially reduced dim; plot it in this way
spatialReducedDim(sfe_sub, "localC_perm_multi",  c(1, 11))

It would be interesting to do this not only on the cell level, but also on the domain level.

Local Neighbor Match Test

This test is useful to assess the overlap of the k-nearest neighbours from physical distances (tissue space) with the k-nearest neighbours from the gene expression measurements (attribute space).

[MR: how are the KNNs calculated in the attribute space?]

Code
sf <- colGeometries(sfe)$cellSeg
sf <- cbind(sf,  t(as.matrix(logcounts(sfe)[c(apoptosis_genes),])))

nbr_test <- neighbor_match_test(sf[c(apoptosis_genes)], k = 20)

sf$Probability <- nbr_test$Probability
sf$Cardinality <- nbr_test$Cardinality

tm_shape(sf) + tm_fill(col = 'Probability')  

Code
tm_shape(sf) + tm_fill(col = 'Cardinality')  

Cardinality is a measure of how many neighbours of each cell are in common. Some regions show high cardinality with low probability. On the cellular level, this might not be very informative and lead to regions with high similarity. We only see very few cells with a cardinality greater than 0. The problem might come from the imperfect segmentation of the cells, which can result in a poor reconstruction of the “geographical” tissue neighbourhood graph. In addition, as the number of dimensions increase, the empty space between observations also increases; this is known as the empty space problem.

Again, this analysis should probably be performed on a structure level instead of a single cell level.

[MR: should we do the analysis at the structure/domain level then?]

Appendix

Session info

Code
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] magrittr_2.0.3                 stringr_1.5.0                 
 [3] dixon_0.0-8                    splancs_2.01-44               
 [5] spdep_1.2-8                    spData_2.3.0                  
 [7] tmap_3.3-4                     scater_1.28.0                 
 [9] scran_1.28.2                   scuttle_1.10.3                
[11] SFEData_1.2.0                  SpatialFeatureExperiment_1.2.3
[13] Voyager_1.2.7                  rgeoda_0.0.10-4               
[15] digest_0.6.33                  ncf_1.3-2                     
[17] sf_1.0-14                      reshape2_1.4.4                
[19] patchwork_1.1.3                STexampleData_1.8.0           
[21] ExperimentHub_2.8.1            AnnotationHub_3.8.0           
[23] BiocFileCache_2.8.0            dbplyr_2.3.4                  
[25] RANN_2.6.1                     seg_0.5-7                     
[27] sp_2.1-1                       rlang_1.1.1                   
[29] ggplot2_3.4.4                  dplyr_1.1.3                   
[31] mixR_0.2.0                     spatstat_3.0-6                
[33] spatstat.linnet_3.1-1          spatstat.model_3.2-6          
[35] rpart_4.1.19                   spatstat.explore_3.2-3        
[37] nlme_3.1-162                   spatstat.random_3.1-6         
[39] spatstat.geom_3.2-5            spatstat.data_3.0-1           
[41] SpatialExperiment_1.10.0       SingleCellExperiment_1.22.0   
[43] SummarizedExperiment_1.30.2    Biobase_2.60.0                
[45] GenomicRanges_1.52.1           GenomeInfoDb_1.36.4           
[47] IRanges_2.34.1                 S4Vectors_0.38.2              
[49] BiocGenerics_0.46.0            MatrixGenerics_1.12.3         
[51] matrixStats_1.0.0             

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.0-2         bitops_1.0-7                 
  [3] httr_1.4.7                    RColorBrewer_1.1-3           
  [5] tools_4.3.1                   utf8_1.2.3                   
  [7] R6_2.5.1                      HDF5Array_1.28.1             
  [9] mgcv_1.8-42                   rhdf5filters_1.12.1          
 [11] withr_2.5.1                   gridExtra_2.3                
 [13] leaflet_2.2.0                 leafem_0.2.3                 
 [15] cli_3.6.1                     labeling_0.4.3               
 [17] proxy_0.4-27                  R.utils_2.12.2               
 [19] dichromat_2.0-0.1             scico_1.5.0                  
 [21] limma_3.56.2                  rstudioapi_0.15.0            
 [23] RSQLite_2.3.1                 generics_0.1.3               
 [25] crosstalk_1.2.0               Matrix_1.5-4.1               
 [27] ggbeeswarm_0.7.2              fansi_1.0.5                  
 [29] abind_1.4-5                   R.methodsS3_1.8.2            
 [31] terra_1.7-55                  lifecycle_1.0.3              
 [33] yaml_2.3.7                    edgeR_3.42.4                 
 [35] rhdf5_2.44.0                  tmaptools_3.1-1              
 [37] grid_4.3.1                    blob_1.2.4                   
 [39] promises_1.2.1                dqrng_0.3.1                  
 [41] crayon_1.5.2                  lattice_0.21-8               
 [43] beachmat_2.16.0               KEGGREST_1.40.1              
 [45] magick_2.8.0                  pillar_1.9.0                 
 [47] knitr_1.44                    metapod_1.7.0                
 [49] rjson_0.2.21                  boot_1.3-28.1                
 [51] codetools_0.2-19              wk_0.8.0                     
 [53] glue_1.6.2                    vctrs_0.6.4                  
 [55] png_0.1-8                     gtable_0.3.4                 
 [57] cachem_1.0.8                  xfun_0.40                    
 [59] S4Arrays_1.0.6                mime_0.12                    
 [61] DropletUtils_1.20.0           units_0.8-4                  
 [63] statmod_1.5.0                 bluster_1.10.0               
 [65] interactiveDisplayBase_1.38.0 ellipsis_0.3.2               
 [67] bit64_4.0.5                   filelock_1.0.2               
 [69] irlba_2.3.5.1                 vipor_0.4.5                  
 [71] KernSmooth_2.23-21            colorspace_2.1-0             
 [73] DBI_1.1.3                     raster_3.6-26                
 [75] tidyselect_1.2.0              bit_4.0.5                    
 [77] compiler_4.3.1                curl_5.1.0                   
 [79] BiocNeighbors_1.18.0          DelayedArray_0.26.7          
 [81] scales_1.3.0                  classInt_0.4-10              
 [83] rappdirs_0.3.3                goftest_1.2-3                
 [85] spatstat.utils_3.0-3          rmarkdown_2.25               
 [87] XVector_0.40.0                htmltools_0.5.6.1            
 [89] pkgconfig_2.0.3               base64enc_0.1-3              
 [91] sparseMatrixStats_1.12.2      fastmap_1.1.1                
 [93] htmlwidgets_1.6.2             shiny_1.7.5.1                
 [95] DelayedMatrixStats_1.22.6     farver_2.1.1                 
 [97] jsonlite_1.8.7                BiocParallel_1.34.2          
 [99] R.oo_1.25.0                   BiocSingular_1.16.0          
[101] RCurl_1.98-1.12               GenomeInfoDbData_1.2.10      
[103] s2_1.1.4                      Rhdf5lib_1.22.1              
[105] munsell_0.5.0                 Rcpp_1.0.11                  
[107] ggnewscale_0.4.9              viridis_0.6.4                
[109] stringi_1.7.12                leafsync_0.1.0               
[111] zlibbioc_1.46.0               plyr_1.8.9                   
[113] parallel_4.3.1                ggrepel_0.9.4                
[115] deldir_1.0-9                  Biostrings_2.68.1            
[117] stars_0.6-4                   splines_4.3.1                
[119] tensor_1.5                    locfit_1.5-9.8               
[121] igraph_1.5.1                  ScaledMatrix_1.8.1           
[123] BiocVersion_3.17.1            XML_3.99-0.14                
[125] evaluate_0.22                 BiocManager_1.30.22          
[127] httpuv_1.6.11                 purrr_1.0.2                  
[129] polyclip_1.10-6               scattermore_1.2              
[131] rsvd_1.0.5                    lwgeom_0.2-13                
[133] xtable_1.8-4                  e1071_1.7-13                 
[135] RSpectra_0.16-1               later_1.3.1                  
[137] viridisLite_0.4.2             class_7.3-22                 
[139] tibble_3.2.1                  memoise_2.0.1                
[141] beeswarm_0.4.0                AnnotationDbi_1.62.2         
[143] cluster_2.1.4                

References

Anselin, Luc. 2019. “A Local Indicator of Multivariate Spatial Association: Extending Geary’s c.” Geographical Analysis 51 (2): 133–50. https://doi.org/10.1111/gean.12164.
Bivand, Roger. 2022. “R Packages for Analyzing Spatial Data: A Comparative Case Study with Areal Data.” Geographical Analysis 54 (3): 488–518. https://doi.org/10.1111/gean.12319.
He, Shanshan, Ruchir Bhatt, Carl Brown, Emily A. Brown, Derek L. Buhr, Kan Chantranuvatana, Patrick Danaher, et al. 2022. “High-Plex Imaging of RNA and Proteins at Subcellular Resolution in Fixed Tissue by Spatial Molecular Imaging.” Nature Biotechnology 40 (12): 1794–1806. https://doi.org/10.1038/s41587-022-01483-z.
Lee, Sang-Il. 2001. “Developing a Bivariate Spatial Association Measure: An Integration of Pearson’s r and Moran’s I.” Journal of Geographical Systems 3 (4): 369–85. https://doi.org/10.1007/s101090100064.
Li, Zhuoxuan, Tianjie Wang, Pengtao Liu, and Yuanhua Huang. 2023. SpatialDM: Rapid Identification of Spatially Co-Expressed Ligand-Receptor Reveals Cell-Cell Communication Patterns.” bioRxiv. https://doi.org/10.1101/2022.08.19.504616.
Wartenberg, Daniel. 1985. “Multivariate Spatial Correlation: A Method for Exploratory Geographical Analysis.” Geographical Analysis 17 (4): 263–83. https://doi.org/10.1111/j.1538-4632.1985.tb00849.x.